Abstract. Planning marketing mix strategies requires retailers to understand within- as 
well as cross-category demand effects. Most retailers carry products in a large variety of 
categories, leading to a high number of such demand effects to be estimated. At the same 
time, we do not expect cross-category effects between all categories. This paper outlines 
a methodology to estimate a parsimonious product category network without prior 
constraints on its structure. To do so, sparse estimation of the Vector AutoRegressive Market 
Response Model is presented. We find that cross-category effects go beyond substitutes and 
complements, and that categories have asymmetric roles in the product category network. 
Destination categories are most influential for other product categories, while convenience 
and occasional categories are most responsive. Routine categories are moderately influential 
and moderately responsive. 
1 Introduction 


While within-category demand effects of the marketing mix have been studied extensively, 
cross-category effects are less well understood (Leeflang and Selva 2012). Nevertheless, 
cross-category effects might be substantial. Some categories are complements, e.g. bacon and 
eggs studied by Niraj et al. (2008) or cake mix and cake frosting studied by Manchanda et al. 
(1999), while others are substitutes, e.g. frozen, refrigerated and shelf-stable juices (Wedel 
and Zhang 2004). But cross-effects also exist among categories that are not complements 
or substitutes for several reasons. First, as a result of brand extensions, brands are no 
longer limited to one category (Erdem 1998; Kamakura and Kang 2007; Ma et al. 2012). 
So advertising and promotion of a brand within one category might spill over to own brand 
sales in other categories. Second, advertising and promotions generate more store traffic and 
therefore more sales in other categories (Bell et al. 1998). And third, lower expenditures in 
one category alleviate the budget constraint such that consumers are able to spend more on 
other, seemingly unrelated, categories (Song and Chintagunta 2007; Lee et al. 2013). 

While cross-category effects might be substantial for these reasons, we do not expect that 
each category's marketing mix variables influence each and every other category. Instead, 
we expect some cross-category effects to be zero or very close to zero, but we cannot a 
priori exclude them. Therefore, we use an exploratory modeling approach for parsimonious 
estimation of a product category network. The network allows us to easily identify categories 
that are influential for or responsive to changes in other categories. Building on a widely used 
category typology of destination, routine, occasional and convenience categories (Blattberg 
et al. 1995; Briesch et al. 2013), we find that destination categories are most influential, 
convenience and occasional categories most responsive, and routine categories moderately 
influential and moderately responsive. 

In order to estimate the cross-category network, this paper presents sparse estimation of 
the Vector AutoRegressive (VAR) model. The estimation is sparse in the sense that some of 
the within- and cross-category effects in the model can be estimated as exactly zero. Initiated 
by the work of Baghestani (1991) and Dekimpe and Hanssens (1995), the VAR Market 
Response Model has become a standard, flexible tool to measure own- and cross-effects of 
marketing actions in a competitive environment. The main drawback of the VAR model is 
the risk of overparametrization because the number of parameters increases quadratically 
with the number of included categories. Earlier studies using the VAR model, e.g. Nijs 
et al. (2001, 2007); Pauwels et al. (2002); Srinivasan et al. (2000, 2004); Steenkamp et al., 
were often limited by this overparametrization problem. To overcome this problem, 
previous research on cross-category effects has limited its attention to a small number of 
categories by studying substitutes or complements (Kamakura and Kang 2007; Song and 
Chintagunta 2007; Leeflang et al. 2008; Bandyopadhyay 2009; Ma et al. 2012). We present 
an estimation technique for cross-category effects in much larger product category networks. 
The technique allows many parameters to be estimated even with short observation periods. 
Short observation periods are commonplace in marketing practice since many firms discard 
data that are older than one year (Lodish and Mela 2007). 

This paper contributes to the extant retail literature in a number of important ways. 
(1) Previous cross-category literature largely limits attention to categories that are directly 
related through substitution, complementarity or brand extensions. We provide evidence 
that cross-category effects go beyond such directly related categories. (2) We introduce the 
concepts of influence and responsiveness of a product category and position different category 
types (destination, routine, occasional and convenience) according to these dimensions. (3) 
To identify the cross-category effects, we estimate a large VAR model using an extension of 
the lasso approach of Tibshirani (1996). 

The remainder of this article is organized as follows. Section 2 positions this paper 
in the cross-category management literature and describes the conceptual framework that 
positions category types according to their influence and responsiveness. Section 3 discusses 
the methodology. We describe the sparse estimator of the VAR model, discuss how to 
construct impulse response functions and compare the sparse estimation technique with two 
Bayesian estimators. In Section 4, a simulation study shows the excellent performance of the 
proposed methodology in terms of estimation reliability and prediction accuracy. Section 5 
presents our data and model, Section 6 our findings on cross-category demand effects. We 
first identify which categories are most influential and which are most responsive to changes 
in other categories. Then, we identify the main cross-category effects based on estimated 
cross-price, promotion and sales elasticities. 


2 Cross-Category Management 


The importance of category management for retailers is widely acknowledged, both as a 
marketing tool for category performance (Fader and Lodish 1990; Basuroy et al. 2001; 
Dhar et al. 2001) and as an operational tool for planning and logistics (Rajagopalan and 
Xia 2012). Successful category management requires retailers to understand cross-category 
effects of prices, promotions and sales. Among these, the cross-category effects of prices 
on sales, which define substitutes and complements, are the most extensively studied 
(Song and Chintagunta 2006; Bandyopadhyay 2009; Leeflang and Selva 2012; Sinitsyn 
2012). Cross-category effects of promotions, e.g. feature and display promotions, on sales 
result from many brands being active in multiple categories (Erdem and Sun 2002). Brand 
associations carry over to products of the same brand in other categories, e.g. through 
umbrella branding (Erdem 1998) or horizontal product line extensions (Aaker and Keller 
1990). Less well understood than the effects of prices and promotions are the effects of 
sales in one category on sales in other categories. Such effects might exist because categories 
are related based on affinity in consumption (Shankar and Kannan 2014), because products 
from various categories are placed close to each other on the shelves (Bezawada et al. 2009; 
Shankar and Kannan 2014), or because of the budget constraint (Du and Kamakura 2008). 
If consumers spend more in a certain category they might, all else equal, spend less in other 
categories simply because they hit their budget constraint. As a result, cross-category effects 
might exist between seemingly unrelated categories. 

When studying these cross-category effects of price, promotion and sales on sales, several 
asymmetries might arise. A first asymmetry concerns within- versus cross-category effects. 
We expect within-category effects to be more prevalent and larger in size than cross-category 
effects (e.g. Song and Chintagunta 2006; Bezawada et al. 2009). A second asymmetry 
concerns category influence versus category responsiveness. Influential categories are important 
drivers of other categories' sales, while sales of responsive categories react to changes in other 
categories. To identify which categories are more influential or more responsive, we build on 
a widely used typology of categories described in Blattberg et al. (1995). 

Blattberg et al. (1995) define 4 category types from the consumer perspective: 
destination, routine, occasional and convenience. Destination categories contain goods that 
consumers plan to buy before they go on a shopping trip, such as soft drinks. Briesch et al. 
(2013) show that destination categories are generally categories in which consumers spend 
a lot of their budget. Retailers typically use a price aggressive promotion strategy and high 
promotion intensity for these destination categories with the goal of increasing store traffic. 
Because consumers shop to buy products in the destination categories, destination categories 
are likely to influence sales in other categories. However, since consumers already plan to 
buy in the destination categories before entering the store, destination category sales will 
not be highly responsive (Shankar and Kannan 2014). 

About 55% to 60% of categories are routine categories (Pradhan 2009). Routine 
categories are regularly and routinely purchased, such as juices and biscuits. Retailers typically 
use a consistent pricing strategy and average level of promotion intensity. Because purchases 
in routine categories can more easily be delayed than purchases in destination categories, we 
expect routine categories to be more responsive. But, since purchases in routine categories 
altogether still account for a large portion of the budget, they are also likely to influence 
sales in other categories. 

Occasional categories follow a seasonal pattern or are purchased infrequently. These 
categories comprise a small proportion of retail expenditures while they typically contain 
more expensive items, like oatmeal. We therefore expect occasional categories to be less 
influential and more responsive than destination or routine categories. 

Finally, convenience categories are categories that consumers find convenient to pick 
up during their one-stop shopping trip, like ready-to-eat meals. These purchase decisions 
are typically made in the store. Since convenience categories are geared towards consumer 
convenience and filling impulse needs, we expect them to be highly responsive. 
3 Sparse Vector Auto-Regressive Modeling 


3.1 Motivation 


The aim of this paper is to identify cross-category demand effects in a large product category 
network. To this end, we use the Vector AutoRegressive (VAR) model. The VAR is ideal for 
measuring within- and cross-category effects of marketing actions since it accounts for both 
inertia in marketing spending and performance feedback effects by treating marketing 
variables as endogenous (Dekimpe and Hanssens 1995). Other studies on cross-category effects, 
e.g. Wedel and Zhang (2004), use a demand model with exogenous prices, or a 
simultaneous equations model without lagged effects like Shankar and Kannan (2014). However, 
managers may set marketing instruments strategically in response to market performance 
and market response expectations. Not accounting for time inertia or feedback effects 
limits our understanding of how the market functions and misleads managerial insights and 
prediction. 

Identifying cross-category demand effects using VAR analysis remains challenging because 
the sheer number of such effects makes them hard to estimate. The number of parameters 
to be estimated in the VAR rapidly explodes, making standard estimation inaccurate. This 
undermines the ability to identify important relationships in the data. To overcome an 
explosion of the number of parameters in the VAR, marketing researchers have used 
pre-estimation dimension reduction techniques, i.e. they first impose restrictions on the model 
and then estimate the reduced model. Four such common techniques are (i) treating 
marketing variables as exogenous (e.g. Nijs et al. 2001; Pauwels et al. 2002 and Nijs et al. 
2007), (ii) estimating submodels rather than a full model (e.g. Srinivasan et al. 2000; 
Srinivasan et al. 2004), (iii) aggregating or pooling over, for instance, stores or competitors (e.g. 
Horvath et al. 2005; Slotegraaf and Pauwels 2008), and (iv) applying Least Squares to 
a restricted model (e.g. Dekimpe and Hanssens 1995; Dekimpe et al. 1999; Nijs et al. 
2007). Most researchers applying pre-estimation dimension reduction techniques recognize 
that they do so because of the practical limitations of standard estimation techniques rather 
than for theoretical reasons (e.g. Srinivasan et al. 2004 and Bandyopadhyay 2009). 

To address the overparametrization of the VAR, we use sparse estimation. Sparsity means 
that some of the within- and cross-category effects in the VAR are estimated as exactly 
zero. As argued in the previous section, from a substantive perspective, we cannot exclude 
cross-category effects before estimation because cross-category effects might occur between 
seemingly unrelated categories. From a methodological perspective, sparse estimation is a 
powerful solution to handle the overparametrization of the VAR. In our cross-category model, 
we endogenously model sales, promotion and prices of 17 product categories. Hence, already 
in a VAR model with one lag, as many as (3 x 17) x (3 x 17) = 2601 within- and 
cross-category effects need to be estimated. Since the sparse estimation procedure sets some of 
these effects to zero, a more parsimonious model is obtained. Results are easier to interpret 
and, therefore, the sparse estimation procedure provides actionable insights to managers. 
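The parameter count above follows directly from the dimension of the VAR. As a small illustrative sketch (the function name is ours, not part of the paper), the arithmetic can be checked as follows:

```python
def var_parameter_count(n_categories: int, vars_per_category: int, lags: int) -> int:
    """Number of autoregressive coefficients in an unrestricted VAR with `lags` lags."""
    q = n_categories * vars_per_category  # number of endogenous time series
    return lags * q * q                   # each lag contributes a (q x q) matrix


# 17 categories, 3 variables each (sales, promotion, price), one lag
print(var_parameter_count(17, 3, 1))  # 2601
```

Adding a second lag doubles this count, which is why unrestricted estimation quickly becomes infeasible with short observation periods.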


3.2 Extending the Lasso to the VAR model 


In situations where the number of parameters to estimate is large relative to the sample size, 
the Lasso proposed by Tibshirani (1996) provides a solution within the multiple regression 
model. The Lasso minimizes the least squares criterion penalized for the sum of the absolute 
values of the regression parameters. This penalization forces some of the estimated regression 
coefficients to be exactly zero, which results in selection of the pertinent variables in the 
model. The Lasso method is well established (Bühlmann and van de Geer 2011; Chatterjee 
and Lahiri 2011) and shows good performance in various applied fields (Wu et al. 2009; 
Fan et al. 2011). 
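To illustrate how the absolute-value penalty produces exact zeros, the following minimal sketch fits a Lasso in a single multiple regression by coordinate descent. It is not the algorithm used in this paper; the scaling of the criterion, (1/2n)||y - Xb||^2 + lam * ||b||_1, the function names and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z towards zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (illustrative)."""
    n, k = X.shape
    b = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0) / n          # (1/n) x_j' x_j for every column
    for _ in range(n_iter):
        for j in range(k):
            partial_resid = y - X @ b + X[:, j] * b[j]   # residual without predictor j
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
    return b


# Hypothetical data: 3 relevant predictors out of 20, only 50 observations
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(scale=0.5, size=50)

b_hat = lasso_cd(X, y, lam=0.2)
print("coefficients estimated as exactly zero:", int(np.sum(b_hat == 0)))
```

The soft-thresholding step is what sets small coefficients to exactly zero, in contrast to ridge-type shrinkage, which only pulls them towards zero.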

The Lasso technique can not be directly applied to the VAR model because the VAR 
model differs from a multiple regression model in two important aspects. First, a VAR model 
contains several equations, corresponding to a multivariate regression model. Correlations 
between the error terms of the different equations need to be taken into account. Second, 
a VAR model is dynamic, containing lagged versions of the same time series as right-hand 
side variables of the regression equation. Both aspects of VAR models make it necessary to 
extend the lasso to the VAR context, which is what the sparse estimator in this paper does. 

It builds further on a sparse estimator of the multivariate regression model (Rothman 
et al. 2010), and the groupwise lasso for categorical variables (Yuan and Lin 2006; Meier 
et al. 2008). The estimator is consistent for the unknown model parameters, see Meier et al. 
(2008). 
3.3 Model Specification 

Sales, price and promotion are measured for several categories over a certain time period. 
We collect all these time series in a multivariate time series $y_t$ with $q$ components. In our 
cross-category demand effects study, $y_t$ contains sales, price and promotion for 17 product 
categories, hence $q = 3 \times 17 = 51$. The VAR Market Response Model is given by 

$$ y_t = B_1 y_{t-1} + B_2 y_{t-2} + \dots + B_p y_{t-p} + e_t, \qquad (1) $$

where $p$ is the lag length. The autoregressive parameters $B_1$ to $B_p$ are $(q \times q)$ matrices, which 
capture both within- and cross-category effects. The elements of these matrices measure the 
effect of sales, price and promotion in one category on the sales, price and promotion in other 
categories (including its own). The error term $e_t$ is assumed to follow a $N_q(0, \Sigma)$ distribution. 
We assume, without loss of generality, that all time series are mean centered such that no 
intercept is included. 
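As an illustration of model (1), the sketch below simulates a mean-zero VAR with Gaussian errors. The coefficient matrices, the burn-in length and the function name are hypothetical choices made for this example; they are not taken from the application in this paper.

```python
import numpy as np


def simulate_var(B_list, Sigma, T, burn_in=100, seed=0):
    """Simulate T observations from y_t = B_1 y_{t-1} + ... + B_p y_{t-p} + e_t."""
    rng = np.random.default_rng(seed)
    p = len(B_list)
    q = B_list[0].shape[0]
    total = T + burn_in + p
    errors = rng.multivariate_normal(np.zeros(q), Sigma, size=total)
    y = np.zeros((total, q))
    for t in range(p, total):
        y[t] = errors[t]
        for j, B in enumerate(B_list, start=1):
            y[t] += B @ y[t - j]          # effect of lag j on the current values
    return y[-T:]                          # drop start-up values and burn-in


# Hypothetical bivariate example with p = 2 lags: series 1 leads series 2
B1 = np.array([[0.5, 0.0], [0.3, 0.4]])
B2 = np.array([[0.2, 0.0], [0.0, 0.1]])
Y = simulate_var([B1, B2], Sigma=0.1 * np.eye(2), T=50)
print(Y.shape)  # (50, 2)
```

The zero entries in the first row of both matrices mean that series 2 has no effect on series 1 at any lag, which is the kind of structure the sparse estimator is meant to recover.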

If the number of components $q$ in the multivariate time series is large, the number of 
unknown elements in the sequence of matrices $B_1, \dots, B_p$ explodes to $pq^2$, and accurate 
estimation by standard methods is no longer possible. Sparse estimation, with many 
elements of the matrices $B_1, \dots, B_p$ estimated as zero, brings an outcome: it will not only 
provide estimates with smaller mean squared error, but also substantially improve model 
interpretability. The method we propose does not require the researcher to prespecify which 
entries in the $B_j$ matrices are zero and which are not. Instead, the estimation and variable 
selection are simultaneously performed. This is particularly of interest in situations where 
there is no a priori information on which time series is driving which. 

The instantaneous correlations in model (1) are captured in the error covariance matrix 
$\Sigma$. If the dimension $q$ is large relative to the number of observations, estimation of $\Sigma$ becomes 
problematic. The estimated covariance matrix risks becoming singular, i.e. its inverse does not 
exist. Hence, we also induce sparsity in the estimation of the inverse error covariance matrix 
$\Omega = \Sigma^{-1}$. The elements of $\Omega$ have a natural interpretation as partial correlations between 
the error components of the $q$ equations in model (1). If the $ij$-th element of the inverse 
covariance matrix is zero, this means that, conditional on the other error terms, there is no 
correlation between the error terms of equations $i$ and $j$. 

3.4 Penalized Likelihood Estimation 

This section defines the sparse estimation procedure for the VAR model. The Sparse VAR 
estimator is defined by minimizing a measure of goodness-of-fit to the data combined with 
a penalty for the magnitude of the model parameters. It is convenient to first recast model 
(1) in stacked form as 

$$ y = X\beta + e, \qquad (2) $$

where $y$ is a vector of length $nq$ containing the stacked values of the time series. If the 
multivariate time series has length $T$, then $n = T - p$ is the number of time points for which all 
current and lagged observations are available. The vector $\beta$ contains the stacked vectorized 
matrices $B_1, \dots, B_p$ and $e$ the vector of stacked error terms. The matrix $X = I_q \otimes X_0$, with 
$X_0 = (Y_1, \dots, Y_p)$, is of dimension $(nq \times pq^2)$. Here $Y_j$ is an $(n \times q)$ matrix, containing the 
values of the $y$ series at lag $j$ in its columns, for $1 \le j \le p$, with $p$ the maximum lag. The 
symbol $\otimes$ stands for the Kronecker product. 
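The stacked form (2) can be made concrete with a short sketch. The ordering used here (observations stacked series by series, and for each equation the coefficients ordered lag by lag) is one convention that is consistent with $X = I_q \otimes X_0$; it is our assumption, as are the function names and the toy data.

```python
import numpy as np


def stack_var(Y, p):
    """Return y (length nq), X (nq x p q^2) and X0 (n x pq) for a (T x q) data matrix Y."""
    T, q = Y.shape
    # X0 = (Y_1, ..., Y_p): block j holds the series at lag j, for t = p+1, ..., T
    X0 = np.hstack([Y[p - j: T - j] for j in range(1, p + 1)])
    y = Y[p:].T.reshape(-1)            # stack series by series: n values per series
    X = np.kron(np.eye(q), X0)         # block diagonal with q copies of X0
    return y, X, X0


def stack_beta(B_list):
    """For each equation i, concatenate row i of B_1, ..., B_p; then stack equations."""
    return np.hstack(B_list).reshape(-1)


# Noise-free toy recursion, so that y = X beta holds exactly
rng = np.random.default_rng(0)
q, p, T = 3, 2, 12
B_list = [0.3 * rng.normal(size=(q, q)), 0.2 * rng.normal(size=(q, q))]
Y = np.zeros((T, q))
Y[:p] = rng.normal(size=(p, q))
for t in range(p, T):
    Y[t] = B_list[0] @ Y[t - 1] + B_list[1] @ Y[t - 2]

y, X, X0 = stack_var(Y, p)
beta = stack_beta(B_list)
print(X.shape)                     # (30, 18), i.e. (nq x p q^2) with n = T - p = 10
print(np.allclose(y, X @ beta))    # True
```

Forming the Kronecker product explicitly is only sensible for small examples; for $q = 51$ one would work with $X_0$ directly.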

The sparse estimator of the autoregressive parameters $\beta$ and the inverse covariance 
matrix $\Omega = \Sigma^{-1}$ are obtained by minimizing the negative log likelihood with a groupwise 
penalization on the $\beta$ and a penalization on the off-diagonal elements of $\Omega$: 

$$ (\hat\beta, \hat\Omega) = \underset{(\beta, \Omega)}{\operatorname{argmin}} \; \frac{1}{n} (y - X\beta)' \tilde\Omega (y - X\beta) - \log|\Omega| + \lambda_1 \sum_{g=1}^{G} \|\beta_g\| + \lambda_2 \sum_{k \neq k'} |\Omega_{kk'}|, \qquad (3) $$

where $\|u\| = (\sum_i u_i^2)^{1/2}$ is the Euclidean norm and $\tilde\Omega = \Omega \otimes I_n$. By simultaneously 
estimating $\beta$ and $\Omega$, we take the correlation structure between the error terms into account. 
The vector $\beta_g$ in (3) is a subvector of $\beta$, containing the regression coefficients for the lagged 
values of the same time series in one of the $q$ equations in model (1). The coefficients of the 
lagged values of the same time series form a group. The total number of groups is $G = q^2$ 
because there are $q$ groups within each of the $q$ equations. The penalty on the regression 
coefficients enforces that either all elements of the group $\beta_g$ are zero or none. As a result, 
we take the dynamic nature of the VAR model into account since the estimated $B_j$ matrices, 
for $j = 1, \dots, p$, have their zero elements in exactly the same cells. The penalization on the 
off-diagonal elements of $\Omega$ induces sparsity in the estimate $\hat\Omega$. Finally, the scalars $\lambda_1$ and $\lambda_2$ 
control the degree of sparsity of the regression estimator and the inverse covariance matrix 
estimator, respectively. The larger these values, the more sparsity is imposed. Details on 
the algorithm to perform penalized likelihood estimation and the selection of the sparsity 
parameters $\lambda_1$ and $\lambda_2$ can be found in Appendix A. 
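To make the structure of criterion (3) explicit, the following sketch evaluates it for given $\beta$ and $\Omega$. It only evaluates the objective and does not implement the optimization algorithm of Appendix A. The coefficient ordering (equation by equation, lag by lag within an equation), the function name and the random inputs are assumptions made for illustration.

```python
import numpy as np


def sparse_var_objective(beta, Omega, y, X, p, lam1, lam2):
    """Evaluate the penalized negative log likelihood for given beta and Omega."""
    q = Omega.shape[0]
    n = y.size // q
    # Residuals arranged as (series x time); equivalent to using Omega kron I_n
    E = (y - X @ beta).reshape(q, n)
    fit = np.sum(E * (Omega @ E)) / n
    _, logdet = np.linalg.slogdet(Omega)
    # One group per (equation, predictor series): its p lagged coefficients
    coef = beta.reshape(q, p, q)                     # [equation, lag, predictor]
    group_norms = np.sqrt((coef ** 2).sum(axis=1))   # (q x q) matrix of group norms
    off_diag = Omega - np.diag(np.diag(Omega))
    return fit - logdet + lam1 * group_norms.sum() + lam2 * np.abs(off_diag).sum()


# Hypothetical small inputs
rng = np.random.default_rng(0)
n, q, p = 8, 3, 2
X0 = rng.normal(size=(n, p * q))
X = np.kron(np.eye(q), X0)
beta = rng.normal(size=p * q * q)
y = X @ beta + rng.normal(size=n * q)

value = sparse_var_objective(beta, np.eye(q), y, X, p, lam1=0.1, lam2=0.1)
print(float(value))
```

Because the group norm is taken over all lags of one predictor series in one equation, a group is either entirely zero or entirely nonzero, which is why the estimated $B_j$ share their zero pattern across lags.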

Our approach is similar to Hsu et al. (2008), who use the Lasso within a VAR context. 
However, they do not account for the group structure in the VAR model, nor do they 
impose sparsity on the error covariance matrix. Davis et al. propose another sparse 
estimation procedure for the VAR. They infer the sparsity structure of the autoregressive 
parameters from an estimate of the partial spectral coherence using a two-step procedure. 
Since variable selection is performed prior to model estimation, the resulting estimator suffers 
from pre-testing bias. Moreover, the number of parameters might still approach the sample 
size, leading to unstable estimation or even making estimation infeasible if the number of 
parameters still exceeds the sample size. Sparse estimation in economics is a growing field, 
see Fan et al. (2011) and references therein for an overview. 


3.5 Alternative: Bayesian Estimators 

An alternative to the sparse estimation technique is to impose prior information in a Bayesian 
setting. Bayesian regularization techniques have been proposed for the VAR model in 
Litterman and are used in various applied fields such as macroeconomics (Gefang 2014; 
Banbura et al. 2010), finance (Carriero et al. 2012) and marketing (Lenk and Orme 2009; 
Horvath and Fok 2013; Bandyopadhyay 2009). They are also applicable to a situation like 
ours where there are many parameters to be estimated with a limited observation period, 
and are thus a good benchmark. However, these methods are not sparse: they do not 
perform variable selection simultaneously with model estimation. The following two paragraphs 
elaborate on two Bayesian estimators which serve as non-sparse alternatives. 


Minnesota Prior. The original Minnesota prior only specifies a prior distribution for the 
regression parameters of the VAR model. The error covariance matrix $\Sigma$ is assumed to be 
diagonal, and estimated by $\hat\Sigma_{ii} = \hat\sigma_i^2$, with $\hat\sigma_i^2$ the standard OLS estimate of the error variance 
in an AR($p$) model for the $i$-th time series (Koop and Korobilis 2009). The prior distribution 
of the regression parameters is taken to be multivariate normal: 

$$ \beta \sim N(\beta_0, V_0). \qquad (4) $$

For the prior mean, the common choice is $\beta_0 = 0$ for stationary series. The prior 
covariance matrix $V_0$ is diagonal. The posterior distribution is again multivariate normal. Full 
technical details can be found in Koop and Korobilis (2009). 

The main advantage of the Minnesota prior is its ease of implementation, since posterior 
inference only involves the multivariate normal distribution. However, imposing the 
Minnesota prior only ensures that the parameter estimates are shrunken towards zero, while the 
Sparse VAR ensures that some parameters will be estimated as exactly zero. 
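The difference between shrinkage and sparsity can be seen in a small sketch. It uses a single regression equation with a known error variance and a zero-mean normal prior with a common prior variance. This is a deliberate simplification and not the full Minnesota variance structure; the function name and the simulated data are hypothetical.

```python
import numpy as np


def normal_prior_posterior_mean(X, y, sigma2, prior_var):
    """Posterior mean of beta under beta ~ N(0, prior_var * I) and known error variance."""
    k = X.shape[1]
    precision = np.eye(k) / prior_var + X.T @ X / sigma2
    return np.linalg.solve(precision, X.T @ y / sigma2)


# Hypothetical data: 2 relevant predictors out of 6
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
beta_true = np.array([1.0, 0.0, 0.0, -0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=40)

ols = np.linalg.lstsq(X, y, rcond=None)[0]
tight = normal_prior_posterior_mean(X, y, sigma2=0.25, prior_var=0.01)  # strong shrinkage
loose = normal_prior_posterior_mean(X, y, sigma2=0.25, prior_var=1e8)   # nearly flat prior

print("OLS norm:", float(np.linalg.norm(ols)))
print("tight-prior norm:", float(np.linalg.norm(tight)))
print("exact zeros under tight prior:", int(np.sum(tight == 0)))
```

A tighter prior pulls all coefficients closer to zero, but none of them is set to exactly zero, so no variable selection takes place.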

Normal Inverted Wishart Prior. The Minnesota prior takes the error covariance matrix $\Sigma$ 
as fixed and diagonal and, hence, not as an unknown parameter. To overcome this problem, 
Banbura et al. (2010) impose an inverse Wishart prior on the matrix. More precisely, 

$$ \beta \mid \Sigma \sim N(\beta_0, \Sigma \otimes \Omega_0) \quad \text{and} \quad \Sigma \sim iW(S_0, \alpha_0), \qquad (5) $$

where $\beta_0$, $\Omega_0$, $S_0$ and $\alpha_0$ are hyperparameters. Under this normal inverted Wishart prior 
(labeled in the remainder of this paper as "NIW"), the posterior for $\beta$, conditional on $\Sigma$, 
is normal, and the posterior for $\Sigma$ is again inverted Wishart. Full technical details can be 
found in Banbura et al. (2010). 


3.6 Impulse Response Functions 

Impulse response functions (IRFs) are extensively used to assess the dynamic effect of 
external shocks to the system, such as changes in the marketing mix. An IRF pictures how 
a change to a certain variable at moment $t$ impacts the value of any other time series at 
time $t + k$, accounting for interrelations with all other variables. The magnitude of the effect 
is plotted as a function of $k$. An extensive discussion on the interpretation of the IRF in 
marketing modeling can be found in Dekimpe and Hanssens (1995). We use IRFs to gain 
insight into the dynamics of within- and cross-category sales, promotion and price effects on 
each of the 17 product category sales. The IRFs are easily computed as a function of the 
Sparse VAR estimator (see Hamilton 1991). Since we want to account for correlated error 
terms, we use generalized IRFs (Pesaran and Shin 1998; Dekimpe and Hanssens 1999). 
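As a sketch of how IRFs follow from the estimated coefficients, the code below computes the moving-average matrices $\Phi_k$ of a VAR recursively and then the generalized impulse response to a one-standard-deviation shock in series $j$, $\sigma_{jj}^{-1/2} \Phi_k \Sigma e_j$, in the form given by Pesaran and Shin. The coefficient values and function names are hypothetical.

```python
import numpy as np


def ma_coefficients(B_list, horizon):
    """Phi_0 = I and Phi_k = sum_j B_j Phi_{k-j}: response at lag k to a unit shock."""
    q = B_list[0].shape[0]
    Phi = [np.eye(q)]
    for k in range(1, horizon + 1):
        acc = np.zeros((q, q))
        for j, B in enumerate(B_list, start=1):
            if k - j >= 0:
                acc += B @ Phi[k - j]
        Phi.append(acc)
    return Phi


def generalized_irf(B_list, Sigma, shock, horizon):
    """Generalized IRF of all series to a one-s.d. shock in series `shock`."""
    Phi = ma_coefficients(B_list, horizon)
    scale = 1.0 / np.sqrt(Sigma[shock, shock])
    return np.array([scale * (P @ Sigma[:, shock]) for P in Phi])


# Hypothetical bivariate VAR(1) with correlated errors
B = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
girf = generalized_irf([B], Sigma, shock=0, horizon=4)
print(girf.shape)  # (5, 2): horizons 0..4, two responding series
```

Unlike orthogonalized IRFs based on a Cholesky factor, the generalized IRF does not depend on the ordering of the variables, which matters when there is no natural ordering of 51 series.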

To obtain confidence bounds for the generalized IRFs estimated by Sparse VAR, we 
use a residual parametric bootstrap procedure (Chatterjee and Lahiri 2011). We generate 
$N_b = 1000$ time series of length $T$ from the VAR model (2). The invertible estimate of $\Sigma$ 
delivered by the Sparse VAR estimation procedure is needed to draw random numbers for 
the $N_q(0, \hat\Sigma)$ error distribution. For each of these $N_b$ multiple time series, the estimates of the 
regression parameters are computed. We compute the covariance matrix of the $N_b$ bootstrap 
replicates. For each of the $N_b$ generated series, impulse response functions are computed; the 
90% confidence bounds are then obtained by taking the 5% and 95% percentiles. 
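The bootstrap steps above can be sketched as follows. To keep the example short and self-contained, it uses a VAR(1), ordinary least squares as a stand-in for the Sparse VAR estimator, plain (not generalized) IRFs, and fewer bootstrap replicates than the 1000 used in the paper. All of these are simplifying assumptions, as are the function names and the simulated data.

```python
import numpy as np


def fit_var1_ls(Y):
    """Least-squares VAR(1) fit; a stand-in for the Sparse VAR estimator."""
    X, Z = Y[:-1], Y[1:]
    B = np.linalg.lstsq(X, Z, rcond=None)[0].T
    resid = Z - X @ B.T
    return B, resid.T @ resid / len(Z)


def simulate_var1(B, Sigma, T, rng):
    """Draw one series of length T from the fitted VAR(1) with Gaussian errors."""
    q = B.shape[0]
    e = rng.multivariate_normal(np.zeros(q), Sigma, size=T)
    Y = np.zeros((T, q))
    Y[0] = e[0]
    for t in range(1, T):
        Y[t] = B @ Y[t - 1] + e[t]
    return Y


def irf_var1(B, horizon):
    """Unit-shock IRF of a VAR(1): B^k for k = 0, ..., horizon."""
    return np.array([np.linalg.matrix_power(B, k) for k in range(horizon + 1)])


def bootstrap_irf_bounds(Y, horizon=5, n_boot=200, seed=0):
    """Point estimate of the IRF with 5% and 95% bootstrap percentiles."""
    rng = np.random.default_rng(seed)
    B_hat, Sigma_hat = fit_var1_ls(Y)
    draws = np.empty((n_boot, horizon + 1) + B_hat.shape)
    for b in range(n_boot):
        Y_star = simulate_var1(B_hat, Sigma_hat, len(Y), rng)  # series from fitted model
        B_star, _ = fit_var1_ls(Y_star)                        # re-estimate on each series
        draws[b] = irf_var1(B_star, horizon)
    lower, upper = np.percentile(draws, [5, 95], axis=0)
    return irf_var1(B_hat, horizon), lower, upper


# Hypothetical data from a bivariate VAR(1)
B_true = np.array([[0.5, 0.0], [0.3, 0.4]])
Y = simulate_var1(B_true, 0.1 * np.eye(2), 60, np.random.default_rng(42))
point, lower, upper = bootstrap_irf_bounds(Y, horizon=4, n_boot=100)
print(point.shape, lower.shape, upper.shape)
```

Drawing errors from the fitted normal distribution requires a valid covariance estimate, which is why the invertible estimate of $\Sigma$ matters in high dimensions.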


4 Estimation and prediction performance 

We conduct a simulation study to compare the proposed Sparse VAR with Bayesian methods 
using the Minnesota and NIW prior. As benchmarks, we include the classical Least Squares 
(LS) estimator and two restricted versions of LS which are often used in practice. In the 
1-step Restricted LS (Dekimpe and Hanssens 1995; Dekimpe et al. 1999), we estimate the 
model with classical LS, delete all variables with |t-statistic| < 1, and re-estimate the model 
with the remaining variables. We also consider an iterative Restricted LS method described 
in Lütkepohl and Krätzig (2004), where we fit the full model using LS and sequentially 
eliminate the variables leading to the largest reduction of BIC until no further improvement 
is possible, of which a close variant was used by Nijs et al. 
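The 1-step Restricted LS benchmark can be sketched for a single equation as follows. The threshold of 1 on the absolute t-statistic follows the description above; the function name and the simulated data are hypothetical, and in a VAR the procedure would be applied to each equation.

```python
import numpy as np


def one_step_restricted_ls(X, y, threshold=1.0):
    """Fit LS, drop predictors with |t-statistic| below the threshold, then refit."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b_full = XtX_inv @ X.T @ y
    resid = y - X @ b_full
    s2 = resid @ resid / (n - k)                        # residual variance estimate
    t_stats = b_full / np.sqrt(s2 * np.diag(XtX_inv))
    keep = np.abs(t_stats) >= threshold
    b = np.zeros(k)
    if keep.any():
        b[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
    return b, keep


# Hypothetical data: two strong predictors, four irrelevant ones
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
beta_true = np.array([5.0, -5.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=100)

b_restricted, kept = one_step_restricted_ls(X, y)
print("kept predictors:", np.flatnonzero(kept))
```

Because the full LS fit must exist before anything can be deleted, this benchmark requires more observations than parameters per equation, unlike the sparse estimator.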




We simulate from a VAR model with $q = 10$ dimensions and $p = 2$ lags. Each time series 
has its own autoregressive structure and we include system dynamics among the different 
series. The first series leads series two to five, while the sixth series leads time series 7 to 10. 
Specifically, the data generating processes are given by 

$$ y_t = \begin{pmatrix} B_1 & 0 \\ 0 & B_1 \end{pmatrix} y_{t-1} + \begin{pmatrix} B_2 & 0 \\ 0 & B_2 \end{pmatrix} y_{t-2} + e_t, $$

where $B_1$ and $B_2$ are $(5 \times 5)$ matrices. 
In total, there are $pq^2 = 200$ regression parameters to be estimated with 36 true 
parameter values different from zero. The 10-dimensional error term $e_t$ is drawn from a multivariate 
normal with mean zero and covariance matrix $\Sigma = 0.1 I_{10}$. We generate $N_s = 1000$ 
multivariate time series of length 50 according to the above simulation scheme. 
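The sparsity pattern implied by this design can be checked with a short sketch. It assumes that both lags share the same nonzero pattern: own lags on the diagonal, plus the first series of each block of five driving the other four. Only the pattern is built, since the coefficient values are not reproduced here; the function names are ours.

```python
import numpy as np


def block_pattern(size=5):
    """Nonzero pattern of one block: own lags, and series 1 of the block leads the rest."""
    pattern = np.eye(size, dtype=bool)
    pattern[1:, 0] = True   # rows are responding series, column 0 is the leading series
    return pattern


def full_pattern(q=10, p=2):
    """Block-diagonal nonzero pattern for all p lags, shape (p, q, q)."""
    block = block_pattern(q // 2)
    zero = np.zeros_like(block)
    one_lag = np.block([[block, zero], [zero, block]])
    return np.stack([one_lag] * p)


pattern = full_pattern()
print(pattern.size, int(pattern.sum()))  # 200 36
```

Under this assumption the count matches the 36 nonzero parameters out of 200 stated above: 9 per block, 2 blocks per lag, 2 lags.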


4.1 Performance measures 


We evaluate the different estimators in terms of (i) estimation accuracy, (ii) sparsity 
recognition performance, and (iii) forecast performance. 

To evaluate estimation accuracy, we compute the mean absolute estimation error (MAEE), 
averaged over the simulation runs and over the 200 parameters: 

$$ \text{MAEE} = \frac{1}{N_s} \frac{1}{pq^2} \sum_{s=1}^{N_s} \sum_{j=1}^{p} \sum_{k,l=1}^{q} \left| \hat b^{(s)}_{kl,j} - b_{kl,j} \right|, $$

where $\hat b^{(s)}_{kl,j}$ is the estimate of $b_{kl,j}$, the $kl$-th element of the matrix $B_j$ corresponding to lag $j$, 
for the $s$-th simulation run. 
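A direct way to compute this measure, assuming the estimates of all runs are collected in one array (the array layout and function name are our choices), is:

```python
import numpy as np


def maee(B_hat_runs, B_true):
    """Mean absolute estimation error.

    B_hat_runs: array of shape (N_s, p, q, q) with the estimates of every run.
    B_true:     array of shape (p, q, q) with the true coefficient matrices.
    """
    return float(np.mean(np.abs(B_hat_runs - B_true)))


# Toy check: every estimate is off by 0.5, so the MAEE equals 0.5
B_true = np.zeros((2, 3, 3))
B_hat_runs = np.full((4, 2, 3, 3), 0.5)
print(maee(B_hat_runs, B_true))  # 0.5
```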

Concerning sparsity recognition, we compute the true positive rate and true negative rate 

$$ \text{TPR}(\hat b, b) = \frac{\#\{(k,l,j) : \hat b_{kl,j} \neq 0 \text{ and } b_{kl,j} \neq 0\}}{\#\{(k,l,j) : b_{kl,j} \neq 0\}} $$

$$ \text{TNR}(\hat b, b) = \frac{\#\{(k,l,j) : \hat b_{kl,j} = 0 \text{ and } b_{kl,j} = 0\}}{\#\{(k,l,j) : b_{kl,j} = 0\}} $$

The true positive rate (TPR) gives an indication of the number of true relevant regression 
parameters detected by the estimation procedure. The true negative rate (TNR) measures 
the hit rate of detecting a true zero regression parameter. Both should be as large as possible. 
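Both rates can be computed from the zero patterns of the estimated and true coefficients. The sketch below is one straightforward implementation; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def sparsity_rates(B_hat, B_true):
    """Return (TPR, TNR) comparing the nonzero pattern of B_hat with that of B_true."""
    est_nonzero = B_hat != 0
    true_nonzero = B_true != 0
    tpr = (est_nonzero & true_nonzero).sum() / true_nonzero.sum()
    tnr = (~est_nonzero & ~true_nonzero).sum() / (~true_nonzero).sum()
    return float(tpr), float(tnr)


B_true = np.array([[1.0, 0.0], [0.0, 2.0]])
B_sparse = np.array([[0.9, 0.0], [0.1, 0.0]])   # finds one of two effects, one false alarm
B_dense = np.array([[0.9, 0.2], [0.1, 1.8]])    # no zeros at all, as with LS

print(sparsity_rates(B_sparse, B_true))  # (0.5, 0.5)
print(sparsity_rates(B_dense, B_true))   # (1.0, 0.0)
```

The second call shows why estimators without exact zeros, such as LS or the Bayesian estimators, have a perfect true positive rate and a zero true negative rate by construction.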

Table 1: Mean Absolute Estimation Error (MAEE), True Positive Rate (TPR), True Negative 
Rate (TNR) and Mean Absolute Forecast Error (MAFE), averaged over 1000 simulation 
runs, are reported for every method. 

Method                      MAEE    TPR     TNR     MAFE
Sparse VAR                  0.041   0.860   0.848   0.359
LS                          0.157   1       0       0.540
Restricted LS: 1-step       0.121   0.700   0.541   0.520
Restricted LS: Iterative    0.116   0.261   0.775   0.516
Bayesian: Minnesota         0.044   1       0       0.355
Bayesian: NIW               0.077   1       0       0.476

Finally, we conduct an out-of-sample rolling window forecasting exercise. Using the same 
simulation design as before, we generate multivariate time series of length $T = 60$, and use 
a rolling window of length $S = 50$. For all estimation methods, 1-step-ahead forecasts 
are computed for $t = S, \dots, T - 1$. Next, we compute the Mean Absolute Forecast Error 
(MAFE), averaged over all time series and across time: 

$$ \text{MAFE} = \frac{1}{T - S} \frac{1}{q} \sum_{t=S}^{T-1} \sum_{i=1}^{q} \left| \hat y^{(i)}_{t+1} - y^{(i)}_{t+1} \right|, $$

where $y^{(i)}_{t+1}$ is the value of the $i$-th time series at time $t + 1$. 
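The rolling window exercise can be sketched as follows. Least squares is used as a placeholder estimator so that the example is self-contained; any other estimator with the same interface could be passed in through the `fit` argument. The function names and the simulated data are assumptions made for illustration.

```python
import numpy as np


def fit_var_ls(Y, p):
    """Least-squares VAR(p) fit; returns a (pq x q) coefficient matrix."""
    T, q = Y.shape
    X0 = np.hstack([Y[p - j: T - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X0, Y[p:], rcond=None)[0]


def forecast_one_step(Y, coef, p):
    """1-step-ahead forecast from the last p observations of Y."""
    x = np.hstack([Y[len(Y) - j] for j in range(1, p + 1)])  # lag 1 first, as in X0
    return x @ coef


def rolling_mafe(Y, S, p, fit=fit_var_ls):
    """Mean absolute 1-step-ahead forecast error over a rolling window of length S."""
    T, q = Y.shape
    abs_errors = []
    for t in range(S, T):
        window = Y[t - S: t]                 # the S most recent observations
        coef = fit(window, p)
        y_hat = forecast_one_step(window, coef, p)
        abs_errors.append(np.abs(y_hat - Y[t]))
    return float(np.mean(abs_errors))        # averages over T - S forecasts and q series


# Hypothetical data: bivariate VAR(1) of length T = 60 with window S = 50
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.0], [0.3, 0.4]])
Y = np.zeros((60, 2))
for t in range(1, 60):
    Y[t] = B @ Y[t - 1] + rng.normal(scale=np.sqrt(0.1), size=2)

print(rolling_mafe(Y, S=50, p=1))
```

With $T = 60$ and $S = 50$ this produces ten forecasts per series, each based only on data available at the time of the forecast.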


4.2 Results 

Table 1 presents the performance measures of the Sparse VAR, the Bayesian and benchmark 
methods. The Sparse VAR estimator performs best in terms of estimation accuracy. It 
attains the lowest value of the MAEE (0.041). A paired t-test confirms that the Sparse VAR 
significantly outperforms the other methods (all p-values < 0.001). 

Sparsity recognition performance is evaluated using the true positive rate and the true 
negative rate, reported in Table 1. For the LS and Bayesian estimators, all parameters are 
estimated as non-zero, resulting in a perfect true positive rate and zero true negative rate. 
Among the variable selection methods, the Sparse VAR performs best. Sparse VAR achieves 
a value of 0.86 for the true positive rate and 0.85 for the true negative rate. 

Finally, we evaluate the forecast performance of the different estimators by the Mean 
Absolute Forecast Error in Table 1. The Sparse VAR and the Bayesian estimator with 
Minnesota prior achieve the best forecast performance. A Diebold-Mariano test confirms that 
these two methods perform significantly better than the others (p-value < 0.001). There 
is no significant difference in forecast performance between Sparse VAR and the Bayesian 
estimator with Minnesota prior. 

4.3 Robustness checks 

Alternative penalty function. We investigate the robustness of Sparse VAR to the choice of
the penalty function. We replace the grouplasso penalty on the regression coefficients with
the elastic net penalty (Zou and Hastie, 2005). Elastic net is a regularized regression method
that linearly combines the $L_1$ and $L_2$ penalties of respectively lasso and ridge regression.
Like the grouplasso, elastic net produces a sparse estimate of the regression coefficients.
All other steps of the methodology remain unchanged. We find that the grouplasso penalty
performs slightly better than the elastic net penalty in terms of estimation accuracy, sparsity
recognition and prediction performance.
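To make the difference between the two penalties concrete, the sketch below evaluates both on the same coefficients. It assumes an unweighted grouplasso penalty (the sum of Euclidean norms of coefficient groups) and the naive elastic net form $\lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$; the grouping and the tuning values are illustrative only and are not taken from the paper.

```python
import math
from typing import List


def group_lasso_penalty(groups: List[List[float]], lam: float) -> float:
    """lam times the sum over groups of the Euclidean norm of each group."""
    return lam * sum(math.sqrt(sum(b * b for b in g)) for g in groups)


def elastic_net_penalty(coefs: List[float], lam1: float, lam2: float) -> float:
    """Linear combination of the L1 (lasso) and squared L2 (ridge) penalties."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam1 * l1 + lam2 * l2_squared


# Two hypothetical groups, e.g. the lagged coefficients of two predictors.
groups = [[3.0, 4.0], [0.0, 0.0]]
flat = [b for g in groups for b in g]
print(group_lasso_penalty(groups, lam=1.0))
print(elastic_net_penalty(flat, lam1=1.0, lam2=0.5))
```

The grouplasso penalty is zero for a group only if all its coefficients are zero, so it selects whole groups, whereas the elastic net shrinks and selects coefficients individually.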

Sensitivity to the order of the VAR. We estimate the model with Sparse VAR for different
values of $p$ and evaluate the performance. As expected, Sparse VAR attains the best
estimation accuracy for the true value $p = 2$. The results are, however, very robust to the
choice of the order of the VAR. Selecting $p$ too low is slightly worse than selecting $p$ too
high.

Sensitivity to the sparsity parameters. The sparsity parameters are selected according
to the BIC and this selection is an integral part of the estimation procedure. The results
are not sensitive to the value of $\lambda_2$, which controls the sparsity of $\Omega$. The results are more
sensitive to the choice of $\lambda_1$, since it directly influences the sparsity of the autoregressive
parameters. It turns out that Sparse VAR still outperforms the other estimators for a large
range of $\lambda_1$ values.

5 Data and Model 

We use the sparse estimation technique for large VARs described in Section 3 to identify
cross-category demand effects across 17 categories in the Dominick's Finer Foods database.

Table 2: Description of the 17 categories from Dominick's Finer Foods database that are
analyzed in this paper. For each category, we report the proportion of food and drink
expenditures.

Category              Expenditures    Category            Expenditures
Soft Drinks           22.24%          Snack Crackers      3.01%
Cereals               13.92%          Frozen Juices       2.88%
Cheeses               10.46%          Canned Tuna         2.80%
Refrigerated Juices   7.36%           Frozen Dinners      2.00%
Frozen Entrees        6.98%           Front-end-candies   2.00%
Beer                  6.35%           Cigarettes          1.49%
Cookies               0.21%           Oatmeal             1.43%
Canned Soup           1.82%           Crackers            1.37%
Bottled Juices        1.66%




This database is a well-established source of weekly scanner data from a large Midwestern
supermarket chain, Dominick's Finer Foods (e.g. Kamakura and Kang, 2007; Pauwels, 2007).
We first describe the data and model in more detail, and then report on the insights the
Sparse VAR generates in the next section.

We use all 17 product categories in the Dominick's Finer Foods database containing
food and drink items, a much broader selection of categories than previous studies on
cross-category demand effects have considered. A description of each product category can be
found in Table 2. For 15 stores, we obtain weekly sales, pricing and promotional feature and
display data for the 17 product categories.

Sales. Category sales volumes for the 17 categories, measured in dollars per week.
Promotion. The promotional data include the percentage of SKUs of each category that
are promoted (feature and display) in a given week, following Srinivasan et al. (2004).
Prices. To aggregate pricing data from the SKU level to the product category level, we
follow Srinivasan et al. (2004) and Pauwels et al. in using SKU market shares as
weights. Prices are not deflated because there is strong evidence that people are sensitive to
nominal rather than real price changes (Shafir et al., 1997) over short time periods.
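The aggregation of SKU prices to a category price can be illustrated as a share-weighted average. The sketch below is a minimal version under the assumption that the weights are the SKU market shares within the category, normalized to sum to one; the function name and the numbers are hypothetical.

```python
from typing import List


def category_price(sku_prices: List[float], sku_shares: List[float]) -> float:
    """Market-share-weighted average of SKU prices within one category."""
    total_share = sum(sku_shares)
    if total_share <= 0:
        raise ValueError("market shares must sum to a positive number")
    # Dividing by the total makes the function robust to shares that do not
    # sum exactly to one.
    return sum(p * s for p, s in zip(sku_prices, sku_shares)) / total_share


# Two hypothetical SKUs: a large-share SKU at $2.00 and a small-share SKU at $4.00.
print(category_price([2.0, 4.0], [0.75, 0.25]))
```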

We use data from January 1993 to July 1994, 77 weeks in total. We neither use data before
1993 since they contain missing observations, nor observations after 1994 since Srinivasan


































Table 3: Description of the 15 data sets. Each data set contains multivariate time series for
sales ($Y_t$), promotion ($M_t$) and prices ($P_t$).

Store         Number of      Dimension
              Time Points    Y_t    M_t    P_t    Total
Store 1-15    77             17     16     17     50


et al. (2004) pointed out that manufacturers made extensive use of 'pay-for-performance'
price promotions as of 1994, which are not fully reflected in the Dominick's database. This
data range is short relative to the dimension of the VAR, which calls for a regularization
approach such as the Sparse VAR. For all stores, we collect data on sales, promotion and
pricing for all 17 categories. Only for cigarettes, no promotion variable is included in the
VAR since none of the SKUs in that category were promoted during the observation period.

We estimate a separate VAR model for each store, which allows us to evaluate the robustness
of the findings. The multivariate time series entering the VAR model are the log-differenced
sales ($Y_t$), differenced promotion ($M_t$), and log-differenced prices ($P_t$).¹ The dimensions of
the time series are presented in Table 3. We use the Vector Autoregressive model, with
endogenous promotion and prices,


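$$
\begin{bmatrix} Y_t \\ P_t \\ M_t \end{bmatrix}
= B_0 + B_1 \begin{bmatrix} Y_{t-1} \\ P_{t-1} \\ M_{t-1} \end{bmatrix}
+ \ldots + B_p \begin{bmatrix} Y_{t-p} \\ P_{t-p} \\ M_{t-p} \end{bmatrix} + e_t ,
$$

with $e_t$ the error term.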


Averaged across stores, the selected value of $p$ is two for the Sparse VAR. Also for the
Bayesian estimators, the lag order of the VAR is selected using the BIC criterion, which is
one for the majority of the stores.
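The transformations applied to the series before they enter the VAR can be sketched as follows. This is a minimal illustration with hypothetical toy numbers: sales and prices are log-differenced, so that they can be read as growth rates, while the promotion percentage is differenced in levels.

```python
import math
from typing import List


def log_diff(series: List[float]) -> List[float]:
    """Log-differences log(x_t) - log(x_{t-1}); requires strictly positive values."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


def diff(series: List[float]) -> List[float]:
    """First differences x_t - x_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


weekly_sales = [100.0, 110.0, 121.0]   # dollars per week (toy values)
promo_share = [0.20, 0.50, 0.40]       # share of SKUs promoted (toy values)
print(log_diff(weekly_sales))
print(diff(promo_share))
```

Each transformation shortens the series by one observation, which matters when the sample is as short as the 77 weeks used here.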

6 Empirical Results 

We focus on the effects of prices, promotions and sales in category A on the sales (or demand)
in category B, where A and B belong to the product category network. We first study the
direct effects. For instance, there is no direct effect of price of A on sales of B if the
corresponding estimated regression coefficients are equal to zero at all lags. Then we turn
to the complete chain of direct and indirect effects using Impulse Response Functions. For
instance, price in category A indirectly influences sales in category B when the price of
category A influences the price, promotion or sales in a certain other category C which, in
turn, influences the sales of category B. Since we work in a time series setting, both direct
and indirect effects are dynamic in the sense that the effect occurs with a certain delay.

Table 4: Proportion of nonzero within- and cross-category effects of price, promotion and
sales on sales, averaged across 15 stores and 17 product categories.

                   Price    Promotion    Sales
Within-category    :W       30%          %7r
Cross-category     19%      21%          21%

¹ Following standard practice, we first test for stationarity. A stationarity test of all individual time series
using the Augmented Dickey-Fuller test indicates that most time series in levels are integrated of order 1.


6.1 A network of product categories 


We analyze cross-category demand effects as a network of interlinked product categories in
which prices, promotions and sales in one category have an effect on sales in other categories.
Recently, network perspectives have been increasingly used by marketing researchers to
model, for example, the network value of a product in a product network (Oestreicher-Singer
et al., 2013) or to investigate the flow of influence in a social network (Zubcsek and Sarvary,
2011). In our case, the 17 product categories are the nodes of the network. We estimate
the Sparse VAR for 15 stores separately. If the Sparse VAR estimation results indicate, by
giving a non-zero estimate, that prices in one category have a direct influence on sales in
another category in the majority of the 15 stores, a directed edge is drawn between them.
The resulting directed network is plotted in Figure 1. Similarly, Figures 2 and 3 present
cross-category effects of respectively promotion and sales on sales. If promotion or sales in
one category directly influence sales in another category, respectively, this is indicated by a
directed edge.
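The majority rule for drawing edges can be sketched as below. The data layout is an assumption made for illustration: `store_coefs[s][src][dst]` holds the estimated lag coefficients of the effect of category `src` on the sales of category `dst` in store `s`, and a direct effect exists in a store if at least one of these coefficients is non-zero.

```python
from typing import List, Set, Tuple

LagCoefs = List[float]                    # one coefficient per lag
StoreCoefs = List[List[LagCoefs]]         # indexed as [source][destination]


def has_direct_effect(lag_coefs: LagCoefs) -> bool:
    """A direct effect exists if the coefficient is non-zero at some lag."""
    return any(c != 0.0 for c in lag_coefs)


def majority_edges(store_coefs: List[StoreCoefs]) -> Set[Tuple[int, int]]:
    """Directed cross-category edges found in a strict majority of stores."""
    n_stores = len(store_coefs)
    n_cat = len(store_coefs[0])
    edges = set()
    for src in range(n_cat):
        for dst in range(n_cat):
            if src == dst:
                continue  # within-category effects are not network edges
            votes = sum(has_direct_effect(store[src][dst]) for store in store_coefs)
            if votes > n_stores / 2:
                edges.add((src, dst))
    return edges


# Three hypothetical stores, two categories, two lags.
s1 = [[[0.0, 0.0], [0.1, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
s2 = [[[0.0, 0.0], [0.0, 0.2]], [[0.3, 0.0], [0.0, 0.0]]]
s3 = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
print(majority_edges([s1, s2, s3]))
```

In the toy example only the effect of category 0 on category 1 is present in two of the three stores, so it is the only edge drawn.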

A first important finding is that the cross-category networks are sparse: not each category
influences each and every other category. While the sparse VAR estimation favors zero-effects,
it does not enforce them. Here, as many as 78% of all estimated effects are zero-effects.
Table 4 summarizes the prevalence of within- and cross-category effects. As expected,
within-category effects are more common than cross-category effects. For all categories, past
values of the own category's sales are selected for almost all stores. Cross-category effects of
price on sales (19%), promotion on sales (21%) and sales on sales (21%) are about equally
prevalent.

Figure 1: Cross-category effect network of prices on sales: a directed edge is drawn from
one category to another if its price influences sales in the other category for the majority of
stores.

Next, we focus on category influence and responsiveness in the cross-category network,
measured by the number of edges originating from and pointing to a category respectively.
As discussed in Section 2, destination categories are expected to be more influential, while
convenience categories are expected to be more responsive. We discuss which types of
categories we find to be most influential and/or responsive in the cross-category networks of
prices on sales, promotion on sales, and sales on sales.
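Influence and responsiveness as defined above are the out-degree and in-degree of a node in the directed network. A minimal sketch, with hypothetical category labels and edges that do not correspond to the estimated networks:

```python
from typing import Dict, Iterable, List, Tuple


def influence_and_responsiveness(
    categories: List[str], edges: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count outgoing edges (influence) and incoming edges (responsiveness)."""
    influence = {c: 0 for c in categories}
    responsiveness = {c: 0 for c in categories}
    for source, target in edges:
        influence[source] += 1       # edge originates from `source`
        responsiveness[target] += 1  # edge points to `target`
    return influence, responsiveness


toy_edges = [("A", "B"), ("A", "C"), ("B", "C")]
infl, resp = influence_and_responsiveness(["A", "B", "C"], toy_edges)
print(infl)
print(resp)
```

In this toy network, category A is the most influential (two outgoing edges) and category C the most responsive (two incoming edges).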

The most influential categories in the cross-category network of prices on sales are
destination categories such as Soft Drinks and Cheeses (cfr. each four outgoing edges in Figure
1). This is consistent with our expectations, as Soft Drinks is known to be a destination
category (Briesch et al., 2013; Shankar and Kannan, 2014; Blattberg et al., 1995). Soft
Drinks is ranked first and Cheeses third in terms of food and drink expenditures (see Table
2) and are both heavily promoted by retailers. A price change in either of these categories
thus strongly influences the budget constraint, which in turn influences purchase decisions in
other categories. In the cross-category network of promotions on sales, Cereals is the most
influential category (cfr. five outgoing edges in Figure 2). Briesch et al. (2013) identified
Cereals as highly ranked among the destination categories. This is not surprising as cereals
are part of daily consumption patterns and are ranked second in terms of food and drink
expenditures. In the cross-category effects network of sales on sales in Figure 3, we identify
again Cheeses as the most influential category.

Figure 2: Cross-category effect network of promotions on sales: a directed edge is drawn
from one category to another if its promotion influences sales in the other category for the
majority of stores.

Figure 3: Cross-category effect network of sales on sales: a directed edge is drawn from
one category to another if its sales influences sales in the other category for the majority of
stores.

We find convenience categories to be highly responsive to changes in other categories.
The most prominent price effects are observed for Canned Soup (cfr. five incoming edges in
Figure 1); the most prominent promotion effects for Frozen Dinners, Crackers and Canned
Soup (cfr. each three incoming edges in Figure 2); and the most prominent sales effects
for Oatmeal and Crackers (cfr. each four incoming edges in Figure 3). These categories are
typically bought out of convenience, such as Frozen Dinners and Canned Soup; or bought on
occasion, such as Oatmeal and Crackers, accounting for a very small percentage of food and
drink expenditures (see Table 2).

Routine categories such as Bottled Juices, Refrigerated Juices, Frozen Juices and Cookies
score moderate-to-high on category influence but are also responsive. This is in line with
our expectation of many grocery categories being routine categories that are moderately
influential and moderately responsive. Finally, the cigarettes category is least responsive and
least influential. This finding is not surprising as cigarettes are addictive; hence, smokers
probably have a stable consumption unrelated to food and drinks.

To confirm the robustness of the results obtained by Sparse VAR, we check whether
category responsiveness and influence are consistent across stores. We compute Kendall's
coefficient of concordance $W$ for category influence and responsiveness calculated from the
graphs in Figures 1-3 at the store level. As $W$ increases from 0 to 1, there is stronger
consistency across stores. Table 5 indicates that all values of Kendall's $W$ are significant.





















Table 5: Kendall's coefficient of concordance across stores of cross-category effects of price,
promotion and sales on sales for both category influence and responsiveness. P-values are
indicated between parentheses.

                  Price       Promotion    Sales
Influence         0.40        0.56         0.30
                  (<0.001)    (<0.001)     (<0.001)
Responsiveness    0.30        0.16         0.17
                  (<0.001)    (0.001)      (<0.001)
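Kendall's coefficient of concordance, as reported in Table 5, can be computed from store-level scores as sketched below. The sketch treats each store as a rater and each category as a ranked item, uses average ranks for ties, and, as a simplification, omits the tie correction of the full formula. With $m$ stores, $n$ categories and rank sums $R_j$, it computes $W = 12 \sum_j (R_j - \bar{R})^2 / (m^2 (n^3 - n))$. Function names and data are hypothetical.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kendalls_w(scores: List[List[float]]) -> float:
    """scores[store][category], e.g. the number of outgoing edges per category."""
    m, n = len(scores), len(scores[0])
    rank_sums = [0.0] * n
    for store_scores in scores:
        for j, r in enumerate(average_ranks(store_scores)):
            rank_sums[j] += r
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # identical rankings
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))             # opposite rankings
```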


Table 6: Size of within- and cross-category effects of price, promotion and sales on sales,
summed across 10 lags of the IRF, averaged across stores and product categories, and in
absolute value.

                   Price    Promotion    Sales
Within-category    0.004    0.006        0.057
Cross-category     0.002    0.005        0.002


6.2 Impulse Response Functions 

For each store, we estimate the Sparse VAR and compute the corresponding Impulse
Response Functions (IRFs). The effect size of an impulse is obtained by summing the absolute
values of the responses across the first 10 lags of the IRF, where we take absolute values in
order not to average out positive and negative responses. We compute effect sizes of impulses
in price, promotion or sales in one product category on the sales in the same (within)
category or another (cross) category. In Table 6, we report the within- and cross-category price,
promotion and sales effect sizes, averaged across the 15 stores and the product categories.
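The effect size computation can be sketched as follows. This is a simplified illustration rather than the procedure used in the paper: it uses unit, non-orthogonalized impulses instead of one standard deviation shocks, and it sums the responses at horizons 1 to 10. The responses follow the standard VAR recursion $\Phi_0 = I$, $\Phi_h = \sum_{j=1}^{\min(h,p)} \Phi_{h-j} B_j$, where entry $(i, j)$ of $\Phi_h$ is the response of variable $i$ at horizon $h$ to an impulse in variable $j$.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def impulse_responses(lag_matrices: List[Matrix], horizon: int) -> List[Matrix]:
    """Return [Phi_0, ..., Phi_horizon] for a VAR with coefficient matrices B_1..B_p."""
    q = len(lag_matrices[0])
    phi = [[[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]]
    for h in range(1, horizon + 1):
        acc = [[0.0] * q for _ in range(q)]
        for j in range(1, min(h, len(lag_matrices)) + 1):
            acc = matadd(acc, matmul(phi[h - j], lag_matrices[j - 1]))
        phi.append(acc)
    return phi


def effect_size(lag_matrices: List[Matrix], impulse: int, response: int,
                horizon: int = 10) -> float:
    """Sum of absolute responses over horizons 1..horizon."""
    phi = impulse_responses(lag_matrices, horizon)
    return sum(abs(phi[h][response][impulse]) for h in range(1, horizon + 1))


# Hypothetical two-variable VAR(1): variable 0 affects itself and variable 1.
b1 = [[0.5, 0.0], [0.2, 0.0]]
print(effect_size([b1], impulse=0, response=1, horizon=2))
print(effect_size([b1], impulse=1, response=0, horizon=2))
```

Summing signed instead of absolute responses gives the quantity used later to study the sign of the cross-category effects.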

Table 6 indicates that, for example, a one standard deviation price shock leads to an
accumulated absolute change of .004 in own sales growth over a time period of 10 lags.
As for the direct effects, we systematically find that within-category effects are larger in
magnitude than cross-category effects, especially for sales and prices. For the marketing
mix, promotions exert stronger within- as well as cross-category effects than price changes.

To get more insight in the sign of the cross-category effects, we summarize each IRF by
the sum of the first 10 responses, and average this number over all stores. Table 7 reports
the five largest positive and negative cross-category effects of price, promotion and sales on
sales.

Table 7: Cross-category price, promotion and sales effects on sales summed across 10 lags
of IRFs and averaged across stores. We present only the five largest positive and negative
effects.

Cross-category price effects

Perceived complements                            Perceived substitutes
Price impulse      Sales response    Effect      Price impulse        Sales response    Effect
Soft Drinks        Canned Tuna       -0.0209     Front-end-candies    Bottled Juices    0.0120
Soft Drinks        Frozen Entrees    -0.0182     Soft Drinks          Frozen Juices     0.0060
Canned Tuna        Canned Soup       -0.0173     Snack Crackers       Beer              0.0058
Cereals            Frozen Dinners    -0.0104     Cookies              Oatmeal           0.0056
Bottled Juices     Crackers          -0.0074     Frozen Juices        Bottled Juices    0.0023

Cross-category promotion effects

Promotion impulse  Sales response    Effect      Promotion impulse    Sales response    Effect
Bottled Juices     Frozen Entrees    0.0586      Oatmeal              Canned Tuna       -0.0214
Cheeses            Frozen Entrees    0.0421      Cheeses              Cookies           -0.0160
Crackers           Frozen Entrees    0.0240      Bottled Juices       Canned Tuna       -0.0158
Frozen Dinners     Frozen Entrees    0.0170      Refrigerated Juices  Canned Tuna       -0.0128
Snack Crackers     Frozen Entrees    0.0127      Cereals              Cheeses           -0.0127

Cross-category sales effects

Sales impulse      Sales response    Effect      Sales impulse        Sales response    Effect
Front-end-candies  Soft Drinks       0.0191      Snack Crackers       Oatmeal           -0.0154
Oatmeal            Frozen Entrees    0.0123      Frozen Juices        Frozen Entrees    -0.0120
Canned Tuna        Crackers          0.0094      Cereals              Frozen Dinners    -0.0099
Front-end-candies  Beer              0.0080      Snack Crackers       Cookies           -0.0087
Snack Crackers     Frozen Dinners    0.0094      Refrigerated Juices  Canned Tuna       -0.0084

Figure 4: Impulse response function: response of frozen juices sales growth to a one standard
deviation impulse in the price of soft drinks. Panels: Store 1, Store 2, Store 3; horizontal
axis: Lag.


Cross-category price effects. We investigate whether consumers perceive categories as
complements or as substitutes. Complementary and substitution effects occur between
categories because they are consumed together or separately. Following the standard economic
definition (Pashigian, 1998), complements are defined as goods having a negative cross-price
elasticity, whereas substitutes are defined as goods having a positive cross-price elasticity.
We find evidence of two important drivers of cross-category price effects: consumption
relatedness and the budget constraint.

As an example of consumption relatedness, consider Soft Drink prices and Frozen Juices.
An increase in Soft Drink prices makes consumers spend more on other drinks as a
compensation, in particular Frozen Juices (see Table 7). The joint dynamic effect of a one standard
deviation price impulse of Soft Drinks on the sales growth response of Frozen Juices is
depicted in Figure 4 for the first three stores in the data set. Note that the instantaneous effect
is estimated as exactly zero since the Sparse VAR puts the corresponding effect in the $\Omega$
matrix to zero. We see a sharp increase in Frozen Juices' sales growth one week after the soft
drink price increase, indicating substitution. However, in the next two weeks, sales growth of
Frozen Juices slows down, which could indicate stockpiling behavior (Gangwar et al., 2014).

Another example of consumption relatedness is Soft Drinks and Frozen Entrees. As can
be seen from Table 7, we find a strong negative effect of Soft Drink prices on Frozen Entrees.
This might be due to the fact that Soft Drinks and Frozen Entrees are consumed together.
We do not find the opposite effect of price changes in Frozen Entrees on the sales of Soft
Drinks. This asymmetry arises because Soft Drinks is a destination category (high influence),
while Frozen Entrees is a convenience category (high responsiveness).

Concerning the budget constraint, prominent cross-category price effects are observed for
Soft Drinks and Cereals, both destination categories. Soft Drinks and Cereals account for a
relatively large proportion of the expenditures of US families (respectively 22% and 14% of
spending on food and drink, see Table 2), which indicates that the budget constraint is an
important source of cross-category effects.

Cross-category promotion effects. The results in Table 7 indicate that branding and
promotion intensity are important drivers of cross-category promotion effects. Concerning
branding, cross-category promotion effects are observed for categories that share brands such
as Frozen Dinners and Frozen Entrees (e.g. the frozen prepared foods brand "Stouffer's").
Concerning promotion intensity, prominent cross-category promotion effects are observed
for categories in which a high percentage of the SKUs is promoted, such as Cheeses and
Bottled Juices (respectively 28% and 26% of SKUs, on average, are promoted in our data).
A promotion impulse in such categories might either trigger joint consumption (e.g. Bottled
Juices and Frozen Entrees), or deter consumption (e.g. Cheeses and Cookies).

Cross-category sales effects. In Table 7 we find evidence of two important drivers of
cross-category effects of sales on sales: affinity in consumption and the budget constraint.
Prominent cross-category sales effects occur because of affinity in consumption. Some
categories are jointly consumed towards a common goal, such as Front-end-candies and Soft
Drinks/Beer (for a light meal); while others such as Snack Crackers and Cookies are
purchased as replacements since consumers might perceive them to have a similar functionality.
Concerning the budget constraint, we find some cross-category sales effects between
seemingly unrelated categories such as Refrigerated Juices and Canned Tuna.

Importantly, the results from Table 7 are in line with our findings on category influence
and responsiveness. Destination categories such as Soft Drinks, Cereals and Cheeses mainly
influence sales in other categories through their price, promotion or sales impulses.
Convenience categories such as Frozen Entrees and Frozen Dinners are more responsive to changes
in other categories. Routine categories, such as Cookies, are moderately influential and
moderately responsive, while occasional categories, such as Oatmeal, are highly responsive.

6.3 Robustness checks 

Alternative penalty function. We investigate the robustness of the results to the choice of
the penalty function. We re-estimate the models using the Sparse VAR with elastic net
instead of the grouplasso penalty (a short explanation of the elastic net is given in Section
4). The managerial insights obtained by Sparse VAR with either grouplasso or elastic net
are very similar. Similarities are that (i) within-category effects are more common and larger
in magnitude than cross-category effects, (ii) destination categories such as Cheeses and
Cereals are very influential, (iii) convenience categories such as Frozen Entrees, and occasional
categories such as Crackers, are very responsive, (iv) routine categories such as Bottled Juices,
Refrigerated Juices and Cookies are both influential and responsive, and (v) the most prominent
cross-category effects of price, promotion and sales on sales are highly overlapping.

Alternative data period. We also check the performance of the Sparse VAR on the
post-1994 data. Retailers made extensive use of "pay-for-performance" price promotions that
are not fully reflected in the Dominick's database. The data generating process might have
changed in this period. Therefore, we should not assume constant parameter values. We
re-estimate the model on the post-1994 data (data from October 1995 until May 1997) and
verify its performance. In the post-1994 period, similar conclusions can be drawn with respect
to within- versus cross-category effects and category influence and responsiveness. Some
differences are observed in the post-1994 period concerning the impulse response functions.
These differences occur due to an altered strategy concerning average pricing and promotion
intensity in the 17 product categories in the post-1994 period compared to the 1993-1994
period. Detailed results are available from the authors upon request.

Alternative sparsity parameter selection. Our results are based on the BIC to select the
penalty parameters. We also ran the analysis using AIC as a selection criterion for the
penalty function. While the models selected by AIC are slightly less sparse, the substantive
insights do not change.

Table 8: Mean Absolute Forecast Error (MAFE) for category-specific sales, averaged over
the 15 stores and the 17 product categories. P-values of a Diebold-Mariano test comparing
the Sparse VAR to its alternatives are indicated between parentheses.

        Sparse VAR    LS         Restricted LS           Bayesian Methods
                                 1-step      Iterative   Minnesota    NIW
MAFE    TZICi.rttl    1208.54    784-fKi     734.82      875.47       1078.03
                      (<0.01)    (<0.01)                 (<0.01)      (<0.01)


6.4 Forecast Performance


Although prediction is not the main goal of the proposed methodology, we deem it important
to show that the Sparse VAR can compete with other methods in terms of prediction
accuracy. We estimate the VAR model of Section 5 for each store and perform a forecast exercise
(cfr. Section 4), using a rolling window of length $S = 67$. One-step-ahead forecasts of sales for
each product category are computed for $t = S, \ldots, T-1$, with $T = 77$. The same estimation
methods as in Section 4 are used.

Results on the sales predictions are summarized in Table 8 by the Mean Absolute Forecast
Error (MAFE), averaged across time and over the 17 product categories and 15 stores. The
MAFE should be seen as a measure of forecast accuracy, not as a measure of managerial
relevance of the obtained results. The variable selection methods Sparse VAR, 1-step and
Iterative Restricted LS perform, on average, better than the methods that do not perform
variable selection. This indicates that sparsity improves prediction accuracy. Sparse VAR
and Iterative Restricted LS achieve the best forecasting performance. A Diebold-Mariano
test (Diebold and Mariano, 1995) confirms that the latter two methods significantly outperform
the other methods. We conclude that the improvement in interpretability of the model
obtained by Sparse VAR, as discussed in the previous section, does not come at the cost of
lower forecast performance.












7 Discussion 


This paper presents a Sparse VAR methodology to detect the inter-relationships in a large
product category network. In the cross-category demand effects application, we detect an
important number of cross-category demand effects for a large number of categories. We find
that categories have asymmetric roles: while destination categories are more influential,
convenience categories are more responsive. We identify main perceived cross-category effects
but also detect cross-category effects between categories that are not directly related at first
sight. Hence the need to study a potentially large number of product categories
simultaneously. While cross-category effects are prevalent, many of them are still absent, calling
for a sparse estimation procedure that succeeds in highlighting the main inter-relationships
in the product category network.

We identify category influence and responsiveness in our cross-category demand effects
application using aggregate store level data. Other cross-category studies, such as Russell
and Kamakura; Ainslie and Rossi (1998); Russell et al.; Russell and Petersen
(2000); Elrod et al. (2002), use market basket data. Since the availability and use of such
market basket data pose difficulties to managers, they rarely use market basket data for
category analysis (Shankar and Kannan, 2014). As managerial decisions are often made at
the category level, managers prefer to work with more readily available aggregate store level
data. Hence, using aggregate category store level data is managerially relevant (Ailawadi et al.;
Leeflang and Selva, 2012).


A first limitation of our approach is that we use aggregate category data, which might lead
to biased estimates when there is heterogeneity on the SKU level (Dekimpe and Hanssens,
2000). Second, our model does not allow us to estimate cross-category effects on the individual
consumer level. Insights into the behavior of consumers are revealed using market basket
data, which requires a very different modeling approach. Despite these limitations, aggregate
category data are highly relevant from the perspective of category management within the
store.

An important advantage of the Sparse VAR is that it overcomes the dimensionality
problem: it results in a parsimonious model with minimal structural constraints. We show
that this leads to more accurate estimation and prediction results as compared to standard
Least Squares methods. If the researcher wishes to restrict some of the parameters to zero
a priori, using marketing theory, this is of course still possible to implement with the Sparse
VAR. The same holds for the reverse, i.e. forcing some variables to be included in the model,
which can be done by adjusting the penalty on the regression coefficients in (3).

The methodology presented in this paper is relevant in a variety of other settings. First,
Sparse VAR can be used to study competitive demand effects across many competitors.
The VAR is ideal for measuring competitive effects since it is able to capture own- and
cross-elasticity of sales to both pricing and marketing spending (Srinivasan et al., 2004;
Horvath et al., 2005). Typically only three competitors are included in such studies, while
using the Sparse VAR allows for a much larger number to be included. Second, in the field
of international marketing research there is an increased interest in studying cross-country
spill-over effects, as for example in Albuquerque et al., van Everdingen et al. (2009) and
Kumar and Krishnan (2002). Every country that is added to the data set leads to an
increase in the number of cross-country parameters to be estimated. Using the proposed
methodology, a large VAR model could be built which allows spill-over effects between many
countries. Finally, the Market Response Model could be extended with data on online word
of mouth or online search, which are now readily available. Especially in the Big Data era,
most companies collect an abundance of variables (Chintagunta et al., 2013), such that large
VAR models will become even larger as more granular data become available.

Appendix A Penalized Likelihood Estimation

We iteratively solve the minimization problem (3) for $\beta$ conditional on $\Omega$ and then for $\Omega$
conditional on $\beta$.

Solving for $\beta|\Omega$: When $\Omega$ is fixed, the minimization problem in (3) is equivalent to minimizing

$$\hat{\beta}|\Omega = \underset{\beta}{\operatorname{argmin}}\; \frac{1}{n}(\tilde{y} - \tilde{X}\beta)'(\tilde{y} - \tilde{X}\beta) + \lambda_1 \sum_{g=1}^{G} \|\beta_g\|_2, \qquad \text{(A.1)}$$

where $\tilde{y} = Py$, $\tilde{X} = PX$, and $P$ is a matrix such that $P'P = \Omega$. The transformation of the
data to $\tilde{y}$ and $\tilde{X}$ ensures that the resulting model has uncorrelated and homoscedastic error
terms. The above minimization problem is convex if $\Omega$ is nonnegative definite. The
minimization problem is equivalent to the groupwise lasso of Yuan and Lin (2006), implemented
in the R package grplasso (Meier, 2009).
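One way to obtain a matrix $P$ with $P'P = \Omega$ is through the Cholesky factorization $\Omega = LL'$, taking $P = L'$. The sketch below illustrates this choice in plain Python; it is an assumption made for illustration, since the text does not state how $P$ is constructed, and the matrix used is a toy example. Because $\Omega$ is the inverse error covariance, errors transformed by $P$ have identity covariance.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whitening_matrix(omega: Matrix) -> Matrix:
    """Return P = L' so that P'P = L L' = omega."""
    low = cholesky(omega)
    n = len(low)
    return [[low[j][i] for j in range(n)] for i in range(n)]


def transform(p: Matrix, vector: List[float]) -> List[float]:
    """Apply P to one observation vector."""
    return [sum(p_ik * v_k for p_ik, v_k in zip(row, vector)) for row in p]


omega = [[4.0, 2.0], [2.0, 3.0]]   # hypothetical inverse error covariance
p = whitening_matrix(omega)
print(p)
print(transform(p, [1.0, 1.0]))
```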

Solving for $\Omega|\beta$: When $\beta$ is fixed, the minimization problem in (3) reduces to

$$\hat{\Omega}|\beta = \underset{\Omega}{\operatorname{argmin}}\; \frac{1}{n}(y - X\beta)'\,\Omega\,(y - X\beta) - \log|\Omega| + \lambda_2 \sum_{i \neq j} |\Omega_{ij}|, \qquad \text{(A.2)}$$

which corresponds to penalized covariance estimation. Using the glasso algorithm of Friedman
et al. (2008), available in the R package glasso (Friedman et al.), the optimization
problem in (A.2) is solved.

We start the algorithm by taking $\Omega = I_q$ and iterate until convergence. We iterate until
$\max_s |\hat{\beta}_{s,i} - \hat{\beta}_{s,i-1}| < \epsilon$, with $\hat{\beta}_{s,i}$ the $s$-th parameter estimate in iteration $i$ (same for $\hat{\Omega}$) and
the tolerance $\epsilon$ set to $10^{-3}$.


Selecting the Sparsity Parameters and the order of the VAR. We first determine
the optimal values of $\lambda_1$ and $\lambda_2$ for a fixed value of $p$, the order of the VAR. The sparsity
parameters $\lambda_1$ and $\lambda_2$ are selected according to a minimal Bayes Information Criterion (BIC).
In the iteration step where $\beta$ is estimated conditional on $\Omega$, we solve (A.1) over a range of
values for $\lambda_1$ and select the one with lowest value of

$$\mathrm{BIC}_{\lambda_1} = -2 \log L_{\lambda_1} + k_{\lambda_1} \log(n), \qquad \text{(A.3)}$$

where $L_{\lambda_1}$ is the estimated likelihood, corresponding to the first term in (A.1), using sparsity
parameter $\lambda_1$. Furthermore, $k_{\lambda_1}$ is the number of non-zero estimated regression coefficients
and $n$ the number of observations. Similarly, for selecting $\lambda_2$, we use the BIC given by

$$\mathrm{BIC}_{\lambda_2} = -2 \log L_{\lambda_2} + k_{\lambda_2} \log(n). \qquad \text{(A.4)}$$

Finally, we select the order $p$ of the VAR. We estimate the VAR for different values of $p$.
The optimal values of $\lambda_1$ and $\lambda_2$ are determined for each of those values of $p$. We select
the order $p$ of the VAR using BIC:

$$\mathrm{BIC}_{(p,\lambda_1(p),\lambda_2(p))} = -2 \log L_{(p,\lambda_1(p),\lambda_2(p))} + k_{(p,\lambda_1(p),\lambda_2(p))} \log(n), \qquad \text{(A.5)}$$

where $L_{(p,\lambda_1(p),\lambda_2(p))}$ and $k_{(p,\lambda_1(p),\lambda_2(p))}$ depend on the value $p$ and the optimally chosen values
of $\lambda_1(p)$ and $\lambda_2(p)$ for that specific value of $p$.
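The BIC-based selection in (A.3)-(A.5) amounts to a grid search: fit the model for each candidate value, compute the criterion, and keep the minimizer. A minimal sketch follows; the log-likelihoods and counts of non-zero coefficients in the example are made-up numbers standing in for fitted models, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_nonzero: int, n_obs: int) -> float:
    """BIC = -2 log L + k log(n), with k the number of non-zero estimates."""
    return -2.0 * log_likelihood + n_nonzero * math.log(n_obs)


def select_by_bic(fits: Dict[float, Tuple[float, int]], n_obs: int) -> float:
    """fits maps a candidate value to (log-likelihood, number of non-zero estimates)."""
    return min(fits, key=lambda cand: bic(fits[cand][0], fits[cand][1], n_obs))


# Hypothetical fits for three candidate values of a sparsity parameter.
candidate_fits = {
    0.1: (-100.0, 40),  # best fit, but many non-zero coefficients
    0.5: (-105.0, 10),  # slightly worse fit, much sparser
    1.0: (-130.0, 2),   # very sparse, poor fit
}
print(select_by_bic(candidate_fits, n_obs=77))
```

The same function can be reused for the order of the VAR by letting the keys be the candidate values of $p$, each evaluated at its own optimally chosen sparsity parameters.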